Insights into Spoken Language Gleaned from Phonetic Transcription of the Switchboard Corpus

Authors

  • Steven Greenberg
  • Joy Hollenback
  • Dan Ellis
Abstract

Models of speech recognition (by both human and machine) have traditionally assumed the phoneme to serve as the fundamental unit of phonetic and phonological analysis. However, phoneme-centric models have failed to provide a convincing theoretical account of the process by which the brain extracts meaning from the speech signal and have fared poorly in automatic recognition of natural, informal speech (e.g., the Switchboard corpus). Over the past five months the Switchboard Transcription Project has phonetically transcribed a portion of the Switchboard corpus in an effort to better understand the failure of phoneme-centric models for machine recognition of speech, as well as to provide a database through which to improve the performance of recognition systems focused on conversational dialogs. Transcription of spoken dialogs illustrates the pitfalls of a phoneme-based system. Many words are articulated in such a fashion as to either omit or significantly transform the phonetic properties of phonemic constituents, thus resulting in wide variation of word pronunciations. Often, only the barest hint of a segment is realized phonetically, in spite of good intelligibility. Despite this large variability in phonetic realization of words, the temporal properties of speech segments, both phones and syllables, appear to conform to regular patterns. This temporal regularity suggests that much of the linguistic information in speech may be signaled through variations in amplitude, pitch and the coarse spectrum, and that such patterns may be useful in the design of future-generation speech recognition systems.

Similar resources

On the Usefulness of Large Spoken Language Corpora for Linguistic Research

In the past, fundamental linguistic research was typically conducted on small data sets that were handcrafted for the specific research at hand. However, from the eighties onwards, many large spoken language corpora have become available. This study investigates the usefulness of large multi-purpose spoken language corpora for fundamental linguistic research. A research task was designed in whi...

The Phonetic Patterning of Spontaneous American English Discourse

Statistical analysis of a manually annotated, 45-minute subset of the SWITCHBOARD corpus indicates that pronunciation variation observed in spontaneous American English discourse is highly structured at the level of the syllable, particularly when prosodic stress accent (i.e., syllable prominence) is taken into account. The pattern of segmental substitutions and deletions observed are largely a...

Automatic Tools for Analyzing Spoken Hebrew

This work summarizes our project to propose a set of automatic tools for analyzing the phonetic and phonological content of spoken Hebrew. The goal of the project is to provide a set of resources to scientists and engineers who work on research and engineering problems related to the acoustics and linguistics of the modern Hebrew language. The set of tools includes: (i) a transcribed corpus of ...

ProPOSEC: A Prosody and PoS Annotated Spoken English Corpus

We have previously reported on ProPOSEL, a purpose-built Prosody and PoS English Lexicon compatible with the Python Natural Language ToolKit. ProPOSEC is a new corpus research resource built using this lexicon, intended for distribution with the Aix-MARSEC dataset. ProPOSEC comprises multi-level parallel annotations, juxtaposing prosodic and syntactic information from different versions of the ...

Publication date: 1996